Methods and devices allowing enhanced interaction between a connected vehicle and a conversational agent

ABSTRACT

A method for processing at least one message sent by a conversational agent, the message including a plurality of data. The processing method is implemented by a processing device and includes: a step of receiving the at least one message; a step of determining at least one first datum to be vocalized and at least one second datum to be displayed, with the data being included in the plurality of data; a step of providing the at least one first datum and the at least one second datum for rendering.

1. FIELD OF THE DISCLOSURE

The disclosure relates to the field of telecommunications and more specifically relates to the digital services provided for a user of a vehicle.

2. PRIOR ART

A known standard is the GSMA “Rich Communication Suite” (RCS) standard that defines an enhanced messaging protocol, evolving from the SMS and MMS services of telecommunications operators. The current standard (Universal Profile 2), via “Application to People” (or A2P) communications, allows a conversational agent (or “chatbot”) to interact with a user. More specifically, the protocol describes a set of enhanced message formats that can be used by a conversational agent to present information and actions to the user. These graphical and/or textual formats have been designed to facilitate conversational commerce when the user is equipped with a suitable terminal, for example, a smartphone, and their attention can be fully focused on the service.

However, within a mobility context, for example, when the user is driving a vehicle, such messages are not entirely suitable. Indeed, in such a context, the user has the communication interfaces of the vehicle at their disposal, which can be substantially different from those of a smartphone/tablet.

Therefore, a need exists for new approaches to interaction between a conversational agent and a vehicle that are adapted to the environment provided on board the vehicle (communication interfaces, ergonomics, etc.) and to the constraints associated with driving.

3. SUMMARY

An aspect of the present disclosure relates to a method for processing at least one message sent by a conversational agent, said message comprising a plurality of data, said processing method being implemented by a processing device and characterized in that it comprises:

-   a step of receiving said at least one message;
-   a step of determining at least one first datum to be vocalized and at least one second datum to be displayed, with said data being included in said plurality of data;
-   a step of providing said at least one first datum and said at least one second datum for rendering.

Advantageously, this method allows a conversational agent to interact with a user via a graphical interface but also via a voice interface.

Specifically, the method retrieves a message from a conversational agent and then processes it in order to acquire at least one datum to be vocalized and at least one datum to be visually rendered to the user. Thus, when the gaze of the user is occupied, for example, looking at the road within a driving context, the user can, via the vocalization, acquire additional information (first datum) allowing them to optimize the dialogue with the conversational agent.

It should be noted that the data contained in the message can be data to be rendered both vocally and graphically.

It also should be noted that the step of determining can include a first step of analyzing the received message (i.e., an analysis of the data included in the message, but also an analysis of the structure of the message) and then a step of determining the data contained in the message as having to be vocalized and/or displayed.

A conversational agent is understood to mean a computer dialogue automaton capable of dialoguing with a user and/or a connected object. The conversational agent generates messages intended for the user/connected object and interprets the responses sent back in order to respond to the requests/requirements of the client.

A message is understood to mean a set of data intended to be sent by a computer system. The message can be textual in the form of a data stream or of one or more files.

According to a particular embodiment of the disclosure, a method as described above is characterized in that the steps of determining and of providing are conditional upon the result of a step of detecting the type of said at least one received message.

Advantageously, this embodiment means that the entire method does not have to be implemented when the message received from the conversational agent is not suitable. For example, when the received message is in XML (Extensible Markup Language) format, the method can process the message in order to check for the presence of a particular tag indicating the type of content of the message. Thus, implementing the steps of determining and of providing is dependent on whether or not such a tag and/or its content is present.

For example, for RCS (Rich Communication Suite) type content, tags such as “rich card”, “suggested action”, “suggested reply”, “media”, “text”, “calendar event”, “suggested chip list”, etc., can be cited.

Alternatively and/or cumulatively, the step of detecting can correspond to the detection of the extension of a received file, i.e., the extension of the message received from the conversational agent. Thus, implementing the steps of determining and of providing is dependent on the type of extension (for example, “.xml”, “.json”, “.exe”, etc.) of the received file.
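
By way of illustration only, the detection step described in the two preceding paragraphs could be sketched as follows; the tag names, the retained extensions and the is_processable() helper are assumptions made for the example and are not prescribed by the disclosure or by the RCS standard.

```python
# A minimal sketch of the type-detection gate, assuming the message is an
# XML or JSON file on disk. Tag and extension names are illustrative only.
import os
import xml.etree.ElementTree as ET

PROCESSABLE_TAGS = {"richcard", "suggestedReply", "suggestedAction"}  # assumed names

def is_processable(path: str) -> bool:
    """Return True only if the determining and providing steps should run."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in {".xml", ".json"}:  # e.g. an ".exe" file is rejected outright
        return False
    if ext == ".xml":
        root = ET.parse(path).getroot()
        # Look for at least one tag indicating a renderable content type.
        return any(root.find(f".//{tag}") is not None for tag in PROCESSABLE_TAGS)
    return True  # JSON messages are inspected later, datum by datum
```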

According to a particular embodiment of the disclosure, a method as described above is characterized in that the step of determining comprises a step of detecting the type of said data included in said message.

Advantageously, this embodiment makes it possible to determine whether a datum is to be vocalized (first datum) and/or displayed (second datum) as a function of the nature of the datum. For example, when the message received from the conversational agent is in XML (Extensible Markup Language) format, the method can process the message in order to detect the presence of a particular tag indicating the type/nature of the datum associated with the tag. For example, for RCS type content, the “suggested replies” tag can be cited.

Alternatively and/or cumulatively, detection can occur as a function of the nature of the datum itself. For example, if the datum is an executable, the method can deduce therefrom that the datum is neither a datum to be vocalized nor a datum to be displayed.
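
By way of illustration only, such a datum-level detection could be sketched as follows, assuming that each datum carries a MIME-like type; the rule set is an assumption made for the example.

```python
# A possible classification of each datum into "first datum" (to be vocalized)
# and/or "second datum" (to be displayed); the MIME-based rules are assumed.
from enum import Flag, auto

class Render(Flag):
    NONE = 0
    VOCALIZE = auto()  # "first datum"
    DISPLAY = auto()   # "second datum"

def classify(mime_type: str) -> Render:
    if mime_type.startswith("text/"):
        return Render.VOCALIZE | Render.DISPLAY  # text can be rendered both ways
    if mime_type.startswith("image/"):
        return Render.DISPLAY                    # an image is displayed only
    if mime_type in {"application/octet-stream", "application/x-msdownload"}:
        return Render.NONE                       # an executable is neither vocalized nor displayed
    return Render.DISPLAY                        # default: visual rendering only
```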

According to a particular embodiment of the disclosure, a method as described above is characterized in that the step of providing is followed by a step of rendering said at least one first datum and at least one second datum.

Advantageously, this embodiment allows the data determined as having to be displayed and to be vocalized to be rendered to a user. Obviously, in this case, the processing device must be capable of vocally and visually rendering the data.

According to a particular embodiment of the disclosure, a method as described above is characterized in that the step of rendering is followed by a step of acquiring at least one third datum, called response datum, from a user.

Advantageously, this embodiment allows a response to the received message to be acquired from the user. This response is acquired, for example, via a voice command and/or an event originating from an input/output peripheral such as a button, a thumbwheel, a tap on a touch screen, etc.

According to a particular embodiment of the disclosure, a method as described above is characterized in that the step of providing further comprises providing at least one fourth datum capable of being vocalized, with said fourth datum being determined as a function of said plurality of data, and in that the step of rendering further comprises rendering said fourth datum.

Advantageously, this embodiment allows the vocalization of certain data to be contextually supplemented with elements (fourth datum) generated as a function of the data that is contained in the received message. These data are, for example, “intentions” constructed as a function of the analysis of the structure of the message and/or of data previously rendered and/or to be rendered to the user by the processing device. For example, an “intention” can be a voice command that the user is likely to enunciate and that the processing method must be able to interpret.

According to a particular embodiment of the disclosure, a method as described above is characterized in that said fourth datum is determined as a function of said at least one first datum and/or of said at least one second datum.

Advantageously, this embodiment allows the vocalization of certain data to be contextually supplemented with elements (fourth datum) generated as a function of the data that is determined as having to be displayed and/or vocalized.

For example, the method can analyze the message received from the conversational agent and determine that a datum to be vocalized corresponds to the first choice from a list of choices to be rendered to the user. In this case, the method can add an element to be vocalized, such as the text “choice 1”.
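
By way of illustration only, the derivation of such a fourth datum from a list of choices could be sketched as follows; the function name is hypothetical.

```python
# A sketch of deriving vocal labels (fourth datum) from a list of choices:
# each choice receives a spoken ordinal prefix.
def label_choices(choices: list[str]) -> list[str]:
    return [f"choice {i}: {text}" for i, text in enumerate(choices, start=1)]

# label_choices(["15 minutes", "one hour", "two hours"])
# -> ["choice 1: 15 minutes", "choice 2: one hour", "choice 3: two hours"]
```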

The disclosure also relates to a device for processing at least one message sent by a conversational agent, said message comprising a plurality of data, characterized in that the device comprises:

-   a module for receiving said at least one message;
-   a module for determining at least one first datum to be vocalized and at least one second datum to be displayed, with said data being included in said plurality of data;
-   a module for providing said at least one first datum and said at least one second datum for rendering.

It should be noted that the processing device can be a distributed device. Therefore, the modules can be distributed over several machines such as terminals, servers, etc., and can communicate with each other via a communication network such as the Internet.

The disclosure also proposes a method for sending at least one item of information relating to a conversational service, said method being implemented by a conversational agent and characterized in that it comprises:

-   a step of creating a message comprising at least one first datum identified as having to be visually rendered and at least one second datum identified as having to be vocally rendered;
-   a step of sending said message to a processing device.

Advantageously, this method allows a conversational agent to generate messages that include both data identified as having to be vocally rendered and data identified as having to be visually rendered.

Thus, the method allows, in the case of a communication between a conversational agent and a vehicle, the driver to be provided with voice functionalities in addition to the visual presentation of elements proposed by the conversational agent.

The disclosure further proposes a device for sending at least one item of information relating to a conversational service, characterized in that the device comprises:

-   a module for creating a message comprising at least one first datum capable of being visually rendered and at least one second datum capable of being vocally rendered;
-   a module for sending said message to a processing device.

The term module can equally correspond to a software component and to a hardware component or a set of hardware and software components, with a software component itself corresponding to one or more computer programs or sub-programs or, more generally, to any element of a program capable of implementing a function or a set of functions as described for the relevant modules. In the same way, a hardware component corresponds to any element of a hardware assembly capable of implementing a function or a set of functions for the relevant module (integrated circuit, chip card, memory card, etc.).

The disclosure also relates to computer programs comprising instructions for implementing the above methods according to any one of the particular embodiments described above, when said programs are executed by a processor. The methods can be implemented in various ways, in particular in hardwired form or in software form. These programs can use any programming language, and can be in the form of source code, object code, or of intermediate code between source code and object code, such as in a partially compiled form, or in any other desirable form.

The disclosure also relates to a computer-readable storage or information medium containing instructions of a computer program as mentioned above. The aforementioned recording media can be any entity or device capable of storing the program. For example, the medium can include a storage medium, such as a ROM, for example a CD-ROM or a microelectronic circuit ROM, or even a magnetic recording medium, for example a hard disk. Moreover, the recording media can correspond to a transmissible medium such as an electrical or optical signal, which can be routed via an electrical or optical cable, by radio or by other means. The programs according to the disclosure particularly can be downloaded over a network such as the Internet.

Alternatively, the recording media can correspond to an integrated circuit in which a program is incorporated, with the circuit being adapted to execute or to be used to execute the method in question.

This sending device, this processing device and these computer programs have similar features and advantages to those described above in relation to the sending method and the processing method.

4. LIST OF FIGURES

Further exemplary features and advantages of aspects of the disclosure will become more clearly apparent from reading the following description of particular embodiments, which are provided by way of simple illustrative and non-limiting examples, and from the accompanying drawings, in which:

FIG. 1 illustrates an example of an environment for implementing a particular embodiment of the disclosure;

FIG. 2 schematically illustrates an example of the architecture of a device adapted for implementing the processing method according to a particular embodiment of the disclosure;

FIG. 3 illustrates the main steps of the processing method according to a particular embodiment of the disclosure;

FIG. 4 illustrates the main steps of the sending method according to a particular embodiment of the disclosure; and

FIG. 5 schematically illustrates an example of the architecture of a device adapted for implementing the sending method according to a particular embodiment of the disclosure.

5. DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

FIG. 1 illustrates an example of an environment for implementing one or more aspects of the present disclosure according to a particular embodiment. More specifically, FIG. 1 illustrates a terminal 100 capable of implementing the processing method according to a particular embodiment. It should be noted that, although the terminal 100 illustrated in FIG. 1 corresponds to a terminal of the “on-board computer” type of a car, aspects of the disclosure can be applied to any type of terminal having a screen and an audio module including, for example, a loudspeaker and/or a microphone, such as, for example, and in a non-limiting manner, a tablet, an electronic reader, a games console, a television, an automated teller machine, terminals/connected objects or even a personal computer.

In the example described in support of FIG. 1, the terminal 100 is an on-board car computer having a touch screen and an audio module adapted for visually and vocally interacting with a user present in the passenger compartment of the vehicle. In a conventional manner, the user of the terminal 100 can command the execution of an operation by performing an action on an area of the screen of the terminal 100. To this end, the terminal interprets the action of the user as a function of what is displayed on the screen and, more specifically, of the location where the action is performed. The action can be, for example, a tap (a brief contact made on the screen), a double tap, a long tap, a drag-and-drop, a gesture made in contact with the screen representing, for example, a signature, or any other action involving contact with the screen.

Similarly, the user can command the execution of an operation via a voice command. To this end, the terminal 100 interprets (voice recognition) the command as a function of the context and/or of the previously vocalized elements.

In the example described in support of FIG. 1, the terminal 100 communicates with a conversational agent (not shown) and acquires a message comprising data to be displayed and vocalized. The communication can be wired, for example, via PLC (Power Line Communication) when recharging the battery of the vehicle when the vehicle is an electric vehicle, or even can be wireless via, for example, Bluetooth®, Wi-Fi® and/or cellular radiotelephony technologies.

In this example, the conversational agent is a service provided by a parking manager. The conversational agent collects parking fees, for example. The data included in the message can correspond to:

-   an image, such as the logo of the parking management company (not shown);
-   a welcome text with an invitation to choose the length of stay;
-   a collection of possible answers that present the range of accepted durations and their associated price.

It should be noted that the message received from the conversational agent can be displayed on the vehicle screen, replacing the previous content (radio frequency, music title, road navigation map, etc.).

According to a particular embodiment, an audible signal is played/rendered in order to indicate the arrival of the message.

According to a particular embodiment, the user is only notified of the arrival of the message, and an action on their part is required in order to trigger the display.

Subsequently, the user can make their choice by using their finger to select the area of the touch screen that corresponds to their requirement or even by using a “thumbwheel” (vehicle control unit) to move around the graphical interface and select the graphical element of interest.

Cumulatively, the terminal 100 analyzes the received message in order to determine the elements to be vocalized.

FIG. 2 illustrates a device 200 configured for implementing the processing method according to a particular embodiment.

According to a particular embodiment of the disclosure, the device 200 has the conventional architecture of a computer (on-board computer of a vehicle), and particularly comprises a memory MEM, a processing unit UT, equipped with a processor PROC, for example, and controlled by the computer program PG stored in the memory MEM. The computer program PG includes instructions for implementing the steps of the processing method as described above, when the program is executed by the processor PROC.

Upon initialization, the code instructions of the computer program PG are loaded, for example, into a memory before being executed by the processor PROC. In particular, the processor PROC of the processing unit UT implements the steps of the processing method according to any one of the particular embodiments described in relation to FIGS. 1 and 3, according to the instructions of the computer program PG.

The device 200 comprises a module COM configured to establish communications with a network, for example an IP network and/or a circuit. This module can be used to communicate (send and receive messages) with a conversational agent, for example when they are in close proximity.

The device 200 also comprises a module TRIG capable of determining, in a message acquired via the module COM from a conversational agent, the data to be vocalized and visually rendered to a user.

The device 200 further comprises a module GIVE capable of providing the user with the data to be vocalized and visually rendered. The provision can occur via the network. The method then sends the data to be vocalized and visually rendered to a terminal (for example, a smartphone of the user) or to a server.

According to a particular embodiment, the module GIVE and the module COM are one and the same module.

Alternatively and/or cumulatively, they can be made available locally. The device 200 can include, for example, a display module (DISP) adapted for displaying the data determined by the module TRIG as having to be visually rendered. The device can further comprise a module AUD adapted for rendering, by means of sounds, via a loudspeaker, for example, the message data determined as having to be vocalized.

According to a particular embodiment, the module AUD comprises a voice recognition module capable of interpreting voice commands enunciated by a user and detected, for example, via a microphone, and then of triggering an associated action as a function of the enunciated voice command.

According to a particular embodiment, the module GIVE and the module AUD are one and the same module.

In a particular embodiment, the module GIVE and the module DISP are one and the same module.

FIG. 3 illustrates steps of the processing method according to one of the particular embodiments of the disclosure presented above, with the method being executed on the terminal 100 described in FIG. 1.

During a first step 300, the vehicle that integrates the terminal 100 enters a parking area. The method then receives a message originating from a conversational agent of a parking manager. The message includes a plurality of data to be vocalized and displayed for the attention of a user present in the passenger compartment of the vehicle.

In step 301, the message is analyzed by the method. Specifically, the method processes/scans the structure of the message (for example, an XML or JSON tree) and determines the data to be vocalized and/or displayed.
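
By way of illustration only, assuming the parking message of FIG. 1 arrives as a JSON tree of the following hypothetical shape (all field names and values are invented for the example), step 301 could be sketched as:

```python
# A sketch of step 301 over a hypothetical JSON message: the structure is
# scanned and each branch is routed to the display and/or vocalization sets.
import json

raw = """{"logo": {"type": "image/png", "url": "logo.png"},
          "text": {"type": "text/plain",
                   "value": "Welcome. Please choose your length of stay."},
          "choices": [{"label": "15 minutes"}, {"label": "one hour"},
                      {"label": "two hours"}]}"""

message = json.loads(raw)
to_display = [message["logo"], message["text"], *message["choices"]]  # logo is displayed only
to_vocalize = [message["text"]["value"]] + [c["label"] for c in message["choices"]]
```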

Vocalization is a known method and is most often used to respond to accessibility requirements. The disadvantage of such a method is a poor user experience when it comes to describing the whole of an electronic document including images, for example. In the case described in support of FIG. 1, the requirement is different. Indeed, it does not involve vocalizing the whole message, but determining the relevant subsets to be vocalized.

The idea is to supplement the visual presentation with a suitable description of the received message via a voice synthesis. Thus, even if the visual attention of the driver is occupied with their driving, they can still take note of the message provided by the conversational agent.

In the case of the message described in support of FIG. 1, the method can determine:

-   that no special processing needs to be provided regarding the logo/image, i.e., no vocalization;
-   that the welcome text is a datum to be vocalized and/or displayed;
-   that the collection/list of possible answers/choices (15 minutes (101a), one hour (101b), two hours (101c), etc.) are also data to be vocalized and/or displayed.

The method can determine the data to be vocalized and displayed as a function of the nature of the datum. For example, when the datum corresponds to text, the method can consider that the datum is to be vocalized. Similarly, if the datum is an image, the method can consider that this datum is to be displayed only.

According to a particular embodiment, the method can stop when it detects that the message is not in the correct format. Indeed, if the message is an executable message/file, the method can consider that the data included in the message is neither to be vocalized nor to be displayed.

According to a particular embodiment, the method can also add elements to be vocalized in addition to the data to be vocalized that is included in the received message. For example, when the method detects the presence of a list of choices in the message, it can add the text “please choose” or even a text “choice X”, with X being the position of the choice in the list of choices. Obviously, such elements to be vocalized are added in such a way that the rendering can be understood by the user. For example, the text “please choose” is added just before the list of choices.

According to a particular embodiment, a datum relating to a running order for the vocalization is associated with each datum and/or element to be vocalized.
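
By way of illustration only, the addition of elements to be vocalized and their running order, as described in the two preceding embodiments, could be sketched as follows; the function and field names are hypothetical.

```python
# A sketch combining the two embodiments above: the added element
# "please choose" is inserted just before the list of choices, and every
# entry carries a running-order datum that drives the vocalization sequence.
def build_vocal_script(welcome: str, choices: list[str]) -> list[tuple[int, str]]:
    script = [(0, welcome), (1, "please choose")]  # added element before the list
    for i, choice in enumerate(choices, start=1):
        script.append((1 + i, f"choice {i}: {choice}"))
    return sorted(script)  # ordered by the running-order datum

# build_vocal_script("Welcome to the car park.", ["15 minutes", "one hour"])
# -> [(0, 'Welcome to the car park.'), (1, 'please choose'),
#     (2, 'choice 1: 15 minutes'), (3, 'choice 2: one hour')]
```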

The method can also determine and/or generate “intentions” as a function of the content of the message. An “intention” is a voice command that the user is likely to enunciate and that the processing method must be able to interpret. For example, when a text is vocalized, an intention is generated allowing the vocalization of the text to be repeated when requested by the user. Specifically, when the user says the “repeat” voice command (generated intention), the text is re-vocalized. In the same way, a “what are the choices?” intention can be generated by the method in order to allow vocalization of the list of choices.

It should be noted that the construction of these intentions can call upon enhanced inferences, for example, in order to take into account an enunciation such as “a quarter of an hour” as relevant for the choice of “15 minutes” (101a).
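
By way of illustration only, such intentions, including the enhanced inference mentioned above, could be sketched as a simple utterance table; in practice, the NLP system mentioned further below could perform this classification. All entries and identifiers are illustrative assumptions.

```python
# A sketch of intention generation: each expected utterance is mapped to an
# action identifier so that recognized commands can be interpreted quickly.
INTENTS = {
    "repeat": "REPEAT_LAST_VOCALIZATION",
    "what are the choices?": "LIST_CHOICES",
    "15 minutes": "SELECT_CHOICE_101a",
    "a quarter of an hour": "SELECT_CHOICE_101a",  # enhanced inference: same choice
}

def interpret(utterance: str) -> str | None:
    """Return the action identifier for a recognized voice command, if any."""
    return INTENTS.get(utterance.strip().lower())
```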

During step 302, the method provides the data determined as having to be vocalized and/or displayed. These data are then processed by a display device and by a voice synthesis/recognition device (step 303).

The rendering is suitably carried out so that the driver and/or the passengers can take note of the message and of the choices of suggested responses in a seamless and entirely safe manner.

Once the data has been rendered, the user is able to interact with the terminal 100 via the touch screen and/or the microphones located in the passenger compartment of the vehicle (response data within the meaning of an embodiment of the disclosure).

It should be noted that recent vehicles are generally equipped with a computer system that is often provided with voice dialogue capabilities: sound sensors effectively disposed in the passenger compartment, with a sound capture system optimized to reduce parasitic noise, a speech recognition module, a speech synthesis module, sound rendering equipment, an activation button on the steering wheel. Typically, the operating system of the vehicle uses these means to allow the user to interact with the various on-board services. The speech recognition module can be backed up by an NLP (Natural Language Processing) system capable of classifying recognized content in order to assign an intention thereto.

During step 302, the method can also provide the intentions associated with the received message. This allows the voice synthesis device/module and the speech recognition device to interact with the user more seamlessly. Indeed, as the possible actions (voice commands) of the user are already determined, they are interpreted faster.

According to a particular embodiment, the method uses a voice synthesis that is different from the one usually used in the vehicle, so that the users identify the conversational agent with a service different from those usually used on board.

According to a particular embodiment, the message received during step 300 is a structured message using a markup language (for example, XML, HTML, etc.). The message includes one or more particular tags, the associated content of which is identified as having to be vocalized.

According to a particular embodiment, the message received during step 300 is a structured message using a markup language (for example, XML, HTML, etc.). The message includes one or more particular tags, the associated content of which is identified as being one or more intentions.
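
By way of illustration only, a message combining these two embodiments could take the following hypothetical form; the tag names are invented for the example and are not taken from any standard.

```xml
<!-- A hypothetical structured message: <vocalize> content is identified as
     having to be vocalized, and <intention> entries are identified as
     intentions; all names are illustrative assumptions. -->
<message>
  <display src="logo.png"/>
  <vocalize>Welcome. Please choose your length of stay.</vocalize>
  <choices>
    <choice id="101a" vocalize="true">15 minutes</choice>
    <choice id="101b" vocalize="true">one hour</choice>
    <choice id="101c" vocalize="true">two hours</choice>
  </choices>
  <intention trigger="repeat" action="repeat-last-vocalization"/>
  <intention trigger="what are the choices?" action="list-choices"/>
</message>
```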

In combination with the processing method, the disclosure proposes a sending method executed by a device capable of implementing a conversational agent. This device can be, for example, and in a non-limiting manner, a tablet, a connected screen, an automated teller machine, terminals/connected objects, a server and/or a computer.

FIG. 4 illustrates steps of the sending method according to a particular embodiment of the disclosure.

In step 400, the sending method generates a message including data to be vocalized and data to be displayed.

According to a particular embodiment, the message includes one or more particular tags, the associated content of which is identified as having to be vocalized and/or displayed.

According to a particular embodiment, the message includes one or more particular tags, the associated content of which is identified as being one or more intentions.

Once the message has been created, the sending method sends the message to a processing device, such as an on-board computer of a connected car.

FIG. 5 illustrates a device 500 configured for implementing the sending method according to a particular embodiment.

According to a particular embodiment of the disclosure, the device 500 has the conventional architecture of a computer (conversational agent), and particularly comprises a memory MEM1, a processing unit UT1, equipped with a processor PROC1, for example, and controlled by the computer program PG1 stored in the memory MEM1. The computer program PG1 includes instructions for implementing the steps of the sending method as described above, when the program is executed by the processor PROC1.

Upon initialization, the code instructions of the computer program PG1 are loaded, for example, into a memory before being executed by the processor PROC1. In particular, the processor PROC1 of the processing unit UT1 implements the steps of the sending method according to any one of the particular embodiments described in relation to FIGS. 1 to 4, according to the instructions of the computer program PG1.

The device 500 comprises a module CREA capable of generating messages including data to be vocalized and data to be displayed. The message uses, for example, a structured language such as XML or JSON.
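
By way of illustration only, the output of the module CREA could be sketched as follows, assuming a JSON encoding; the keys are illustrative assumptions, not prescribed by the disclosure.

```python
# A sketch of message creation on the sending side: the data identified for
# voice rendering and for visual rendering are marked as such by construction.
import json

def create_message(welcome: str, choices: list[str]) -> str:
    return json.dumps({
        "vocalize": [welcome],              # identified for voice rendering
        "display": [{"logo": "logo.png"}],  # identified for visual rendering
        "choices": choices,                 # rendered both ways by the receiver
    })

# create_message("Welcome to the car park.", ["15 minutes", "one hour"])
```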

The device 500 further comprises a module SND configured for establishing communications with a network, for example an IP network and/or a circuit. This module can be used to communicate with an on-board computer of a vehicle when the device is located in the vicinity thereof (sending a message generated by the module CREA to the vehicle).

It is obvious that the embodiment described above has been provided solely for indicative and non-limiting purposes, and that numerous modifications can be easily made by a person skilled in the art, yet without departing from the scope of the disclosure.

For example, aspects and embodiments of the disclosure also can be applied to a battery charging service for an electric vehicle. Specifically, when the vehicle is electrically connected to an electric recharging terminal, the conversational agent running at the terminal sends a message to the dashboard of the vehicle via PLC technology, for example. The sent message can include information to be vocalized and/or displayed, such as tariffs associated with recharging time choices, a welcome text, etc. Once the message is received by the dashboard of the vehicle, the processing method determines the information to be vocalized and the information to be visually rendered. The user present in the passenger compartment of the vehicle is thus able to choose the recharging and payment methods without having to leave the vehicle.

1. A method for processing at least one message sent by a conversational agent, said message comprising a plurality of data, said method being implemented by a processing device and comprising: receiving said at least one message; determining at least one first datum to be vocalized and at least one second datum to be displayed, with said data being included in said plurality of data; and providing said at least one first datum and said at least one second datum for rendering.

2. The method according to claim 1, wherein the determining and providing are conditional upon a result of detecting a type of said at least one received message.

3. The method according to claim 1, wherein the determining comprises detecting a type of said data included in said message.

4. The method according to claim 1, wherein the providing is followed by rendering said at least one first datum and said at least one second datum.

5. The method according to claim 4, wherein the rendering is followed by acquiring at least one third datum, called response datum, from a user.

6. The method according to claim 4, wherein the providing further comprises providing at least one fourth datum capable of being vocalized, with said fourth datum being determined as a function of said plurality of data, and wherein the rendering further comprises rendering said fourth datum.

7. The method according to claim 6, wherein said fourth datum is determined as a function of said at least one first datum and/or of said at least one second datum.

8. A device for processing at least one message sent by a conversational agent, said message comprising a plurality of data, wherein the device comprises: a processor; and a non-transitory computer readable medium comprising instructions stored thereon which when executed by the processor configure the device to: receive said at least one message; determine at least one first datum to be vocalized and at least one second datum to be displayed, with said data being included in said plurality of data; and provide said at least one first datum and said at least one second datum for rendering.

9. A non-transitory computer readable medium comprising a computer program stored thereon comprising instructions for implementing a method for processing at least one message sent by a conversational agent, when the program is executed by a processor, the message comprising a plurality of data, the method comprising: receiving said at least one message; determining at least one first datum to be vocalized and at least one second datum to be displayed, with said data being included in said plurality of data; and providing said at least one first datum and said at least one second datum for rendering.